1. Python Big Data Training Course (Python大數據特訓班): Crawling and Analysis, Simple Scraping Practice with Requests and BeautifulSoup

python3 web-crawler python-crawler requests beautifulsoup

Zoey 2019-04-10 12:51:39 ‧ 5824 views


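Today I used Python to practice a simple scrape of the Books.com.tw (博客來) real-time bestseller chart.
Data scraped:
1. Chart rank
2. Book title
3. Cover image URL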

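import requests
from bs4 import BeautifulSoup

# Books.com.tw (博客來) real-time bestseller chart
url = 'https://www.books.com.tw/web/sys_hourstop/books?loc=act_menu_th_43_001'
# Custom request header so the request looks like it comes from a browser;
# it only takes effect because it is passed to requests.get() below
headers = {'user-agent': 'Mozilla/5.0'}
# Send a GET request to the page, including the custom header
html = requests.get(url, headers=headers)
# Decode the response body as UTF-8 when html.text is read
html.encoding = 'utf-8'
# Parse the page source with BeautifulSoup using the lxml parser
sp = BeautifulSoup(html.text, 'lxml')
# Locate the chart block, then every book entry inside it
m = sp.select('.mod_no')[0].select('.item')
for i in m:
    # Chart rank (printed without a newline so the title follows on the same line)
    print(i.find_all('strong')[0].text, end=' ')
    # Book title
    print(i.find_all('h4')[0].text)
    # Cover image URL
    print(i.select('img')[0]['src'])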
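The custom `headers` dictionary only has an effect if it is passed to `requests.get()` through the `headers=` argument; defining it on its own changes nothing about the request. One way to check which User-Agent a request will carry, without contacting the site, is to build and prepare the request locally. A minimal sketch (it inspects a prepared request; session-level default headers are not part of it):

```python
import requests

url = 'https://www.books.com.tw/web/sys_hourstop/books?loc=act_menu_th_43_001'
headers = {'user-agent': 'Mozilla/5.0'}

# Build the request without sending it; prepare() fills in the method,
# URL and the headers we supplied (session defaults are not merged in here)
prepared = requests.Request('GET', url, headers=headers).prepare()

# Header lookup is case-insensitive, so 'User-Agent' finds 'user-agent'
print(prepared.headers['User-Agent'])

# Without headers=..., the prepared request has no User-Agent of its own;
# requests.get() would then send its default 'python-requests/x.y.z'
bare = requests.Request('GET', url).prepare()
print('User-Agent' in bare.headers)
```

Calling `requests.get(url, headers=headers)` sends the same header that `prepared` shows here.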

sp=BeautifulSoup(html.text,'lxml')
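A note on the parser passed to BeautifulSoup here:
the earlier tutorials used html.parser, which is built into Python's standard library,
but lxml has been widely recommended lately
because it reportedly parses faster,
so this example uses lxml.
lxml is a separate package (pip install lxml); if it is not installed, BeautifulSoup raises bs4.FeatureNotFound.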
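Because lxml may not be installed on every machine, a script can try it first and fall back to the built-in html.parser. A minimal sketch; the helper name `make_soup` and the HTML fragment are hypothetical and only imitate the class names used in the scraper, not the real books.com.tw markup:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if available, otherwise with html.parser.

    make_soup is a hypothetical helper for illustration.
    """
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python
        return BeautifulSoup(markup, 'html.parser')


# Illustrative fragment, not the real page
sample = ('<div class="mod_no"><div class="item">'
          '<strong>1</strong><h4>Sample Book</h4></div></div>')
soup = make_soup(sample)
print(soup.select('.item h4')[0].text)  # Sample Book
```

Both parsers build a tree that `select()` and `find_all()` can query the same way, so the rest of the scraper does not need to change.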

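Result: for each book on the chart, the script prints its rank and title on one line, followed by its cover image URL on the next line.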
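The same extraction can also return structured rows instead of printing them, which makes it possible to test the selectors offline against saved or hand-written HTML. A minimal sketch; the function name `parse_chart`, the dict keys, and the sample markup are hypothetical and only mirror the selectors used in the scraper (`.mod_no`, `.item`, `strong`, `h4`, `img`). The live page's markup may differ or change. It uses html.parser so it runs without lxml; passing 'lxml' instead works the same way when lxml is installed:

```python
from bs4 import BeautifulSoup


def parse_chart(page_html):
    """Return one dict per chart entry (hypothetical helper).

    Uses the same selectors as the scraper: the first .mod_no block,
    then each .item inside it.
    """
    sp = BeautifulSoup(page_html, 'html.parser')
    rows = []
    for item in sp.select('.mod_no')[0].select('.item'):
        rows.append({
            'rank': item.find_all('strong')[0].text.strip(),
            'title': item.find_all('h4')[0].text.strip(),
            'image': item.select('img')[0]['src'],
        })
    return rows


# Hand-written sample markup (an assumption, not copied from books.com.tw)
sample = '''
<div class="mod_no">
  <div class="item"><strong>1</strong><img src="https://example.com/a.jpg"><h4>Book A</h4></div>
  <div class="item"><strong>2</strong><img src="https://example.com/b.jpg"><h4>Book B</h4></div>
</div>
'''

for row in parse_chart(sample):
    # Same layout as the script: rank and title, then the image URL
    print(row['rank'], row['title'])
    print(row['image'])
```

Returning data rather than printing it keeps parsing separate from output, so the rows can later be saved or analyzed.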
